BUILDING AN EFFICIENT, SCALABLE, AND TRAINABLE PROBABILITY-AND-RULE-BASED PART-OF-SPEECH TAGGER OF HIGH ACCURACY

Authors

JIAYUN HAN

Michael Covington

Jiayun Han

Paula Schwanenflugel

Alexander Williams

Maureen Grasso

Xianchun Huang

Jing Han

Abstract

This project aims to build an efficient, scalable, portable, and trainable part-of-speech tagger. Using 98% of Penn Treebank-3 as training data, it first builds a raw tagger based on Bayes' theorem, a hidden Markov model, and the Viterbi algorithm. A reinforcement machine learning algorithm and contextual transformation rules are then applied to increase the tagger's accuracy. The tagger's final accuracy on the testing data is 96.51%, and its speed is about 26,000 words per second on a computer with two gigabytes of random access memory and two 3.00 GHz Pentium duo processors. The tagger's portability and trainability are demonstrated by the taggermaker's success in building a new tagger from a corpus annotated with a tagset different from that of the Penn Treebank.

INDEX WORDS: Part-of-Speech, Tagging, Markov Model, The Viterbi Algorithm, The Bayesian Theorem, Machine Learning, Contextual Rules, Natural Language Processing
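The two stages named in the abstract (a hidden Markov model decoded with the Viterbi algorithm, followed by contextual transformation rules) can be illustrated with a minimal sketch. This is not the thesis's implementation: the bigram model, the add-one smoothing, the function names `train`, `viterbi`, and `apply_rules`, and the rule format `(from_tag, to_tag, previous_tag)` are all assumptions chosen for illustration, and the toy corpus stands in for Penn Treebank data.

```python
import math
from collections import defaultdict


def train(tagged_sentences):
    """Count tag transitions and word emissions from lists of (word, tag) pairs."""
    trans = defaultdict(lambda: defaultdict(int))  # trans[prev_tag][tag]
    emit = defaultdict(lambda: defaultdict(int))   # emit[tag][word]
    for sentence in tagged_sentences:
        prev = "<s>"  # assumed sentence-start marker
        for word, tag in sentence:
            trans[prev][tag] += 1
            emit[tag][word.lower()] += 1
            prev = tag
    return trans, emit


def log_prob(counts, key, n_outcomes, alpha=1.0):
    """Add-alpha smoothed log probability (the smoothing is an assumption)."""
    total = sum(counts.values())
    return math.log((counts.get(key, 0) + alpha) / (total + alpha * n_outcomes))


def viterbi(words, trans, emit):
    """Return the most probable tag sequence under a bigram HMM."""
    if not words:
        return []
    tags = list(emit.keys())
    vocab_size = len({w for t in emit for w in emit[t]}) + 1  # +1 for unknown words
    n_tags = len(tags)
    # best[i][tag] = (log score of best path ending in tag at position i, previous tag)
    best = [{}]
    for t in tags:
        score = (log_prob(trans["<s>"], t, n_tags)
                 + log_prob(emit[t], words[0].lower(), vocab_size))
        best[0][t] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            e = log_prob(emit[t], words[i].lower(), vocab_size)
            best[i][t] = max(
                (best[i - 1][p][0] + log_prob(trans[p], t, n_tags) + e, p)
                for p in tags
            )
    # Follow the back-pointers from the best final tag.
    last = max(tags, key=lambda t: best[-1][t][0])
    path = [last]
    for i in range(len(words) - 1, 0, -1):
        last = best[i][last][1]
        path.append(last)
    return path[::-1]


def apply_rules(tags, rules):
    """Apply hypothetical contextual rules: change from_tag to to_tag after prev_tag."""
    out = list(tags)
    for from_tag, to_tag, prev_tag in rules:
        for i in range(1, len(out)):
            if out[i] == from_tag and out[i - 1] == prev_tag:
                out[i] = to_tag
    return out


corpus = [
    [("the", "DT"), ("dog", "NN"), ("barks", "VBZ")],
    [("the", "DT"), ("cat", "NN"), ("sleeps", "VBZ")],
    [("a", "DT"), ("dog", "NN"), ("sleeps", "VBZ")],
]
trans, emit = train(corpus)
raw = viterbi(["the", "cat", "barks"], trans, emit)
print(raw)
print(apply_rules(["TO", "NN"], [("NN", "VB", "TO")]))
```

Working in log space avoids numeric underflow on long sentences, and the rule pass runs only after decoding, mirroring the abstract's description of a raw tagger whose output is then corrected.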


Similar resources

Active Incremental Recognition of Human Activities in a Streaming Context

Recognising human activities from streaming sources poses unique challenges to learning algorithms. Predictive models need to be scalable, incrementally trainable, and must remain bounded in size even when the data stream is arbitrarily long. In order to achieve high accuracy even in complex and dynamic environments methods should be also nonparametric, i.e., their structure should adapt in res...


The Hidden Information State Dialogue Manager: A Real-World POMDP-Based System

The Hidden Information State (HIS) Dialogue System is the first trainable and scalable implementation of a spoken dialog system based on the Partially Observable Markov Decision Process (POMDP) model of dialogue. The system responds to n-best output from the speech recogniser, maintains multiple concurrent dialogue state hypotheses, and provides a visual display showing how competing hypotheses ...


Trainable High Resolution Melt Curve Machine Learning Classifier for Large-Scale Reliable Genotyping of Sequence Variants

High resolution melt (HRM) is gaining considerable popularity as a simple and robust method for genotyping sequence variants. However, accurate genotyping of an unknown sample for which a large number of possible variants may exist will require an automated HRM curve identification method capable of comparing unknowns against a large cohort of known sequence variants. Herein, we describe a new ...


Trainable, Scalable Summarization Using Robust NLP and Machine Learning

We describe a trainable and scalable summarization system which utilizes features derived from information retrieval, information extraction, and NLP techniques and on-line resources. The system combines these features using a trainable feature combiner learned from summary examples through a machine learning algorithm. We demonstrate system scalability by reporting results on the best combin...


Trainable and Dynamic Computing: Error Backpropagation through Physical Media

Machine learning algorithms, and neural networks in particular, are arguably experiencing a revolution in terms of performance. Currently, the best systems we have for speech recognition, computer vision and similar problems are based on neural networks, trained using the half-century old backpropagation algorithm. Despite the fact that neural networks are a form of analog computers, they are st...




Publication date: 2009
